GenomeScope: fast reference-free genome profiling from short reads

نویسندگان

  • Gregory W. Vurture
  • Fritz J. Sedlazeck
  • Maria Nattestad
  • Charles J. Underwood
  • Han Fang
  • James Gurtowski
  • Michael C. Schatz
چکیده

Summary GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels and error rates. Availability and Implementation http://genomescope.org , https://github.com/schatzlab/genomescope.git . Contact [email protected]. Supplementary information Supplementary data are available at Bioinformatics online.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reference-free prediction of rearrangement breakpoint reads

MOTIVATION Chromosome rearrangement events are triggered by atypical breaking and rejoining of DNA molecules, which are observed in many cancer-related diseases. The detection of rearrangement is typically done by using short reads generated by next-generation sequencing (NGS) and combining the reads with knowledge of a reference genome. Because structural variations and genomes differ from one...

متن کامل

An Ultra-fast Approach to Align Longer Short Reads onto Human Genome

With the advent of second-generation sequencing (SGS) technologies, deoxyribonucleic acid (DNA) sequencing machines have started to produce reads, named as “longer short reads”, which are much longer than previous generation reads, the so called “short reads”. Unfortunately, most of the existing read aligners do not scale well for those second-generation longer short reads. Moreover, many of th...

متن کامل

Impact of Gene Annotation on RNA-seq Data Analysis

RNA-seq has become increasingly popular in transcriptome profiling. One of the major challenges in RNA-seq data analysis is the accurate mapping of junction reads to their genomic origins. To detect splicing sites in short reads, many RNA-seq aligners use reference transcriptome to inform placement of junction reads. However, no systematic evaluation has been performed to assess or quantify the...

متن کامل

A fast and efficient algorithm for mapping short sequences to a reference genome.

Novel high-throughput (Deep) sequencing technology methods have redefined the way genome sequencing is performed. They are able to produce tens of millions of short sequences (reads) in a single experiment and with a much lower cost than previous sequencing methods. In this paper, we present a new algorithm for addressing the problem of efficiently mapping millions of short reads to a reference...

متن کامل

Mapping Accuracy of Short Reads from Massively Parallel Sequencing and the Implications for Quantitative Expression Profiling

BACKGROUND Massively parallel sequencing offers an enormous potential for expression profiling, in particular for interspecific comparisons. Currently, different platforms for massively parallel sequencing are available, which differ in read length and sequencing costs. The 454-technology offers the highest read length. The other sequencing technologies are more cost effective, on the expense o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bioinformatics

دوره 33 14  شماره 

صفحات  -

تاریخ انتشار 2017